{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "machine_shape": "hm"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "gpuClass": "standard",
    "accelerator": "TPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# 转换并量化中文LLaMA/Alpaca模型\n",
        "\n",
        "🎉🎉🎉 **新：现在免费用户也有机会能够转换7B和13B模型了！**\n",
        "\n",
        "💡 提示和小窍门：\n",
        "- 免费用户默认的内存只有12G左右，**笔者用免费账号实测选择TPU的话有机会随机出35G内存**，建议多试几次。如果能随机出25G内存以上的机器就可以了转换7B模型了，35G内存以上机器就能转换13B模型了\n",
        "- Pro(+)用户请选择 “代码执行程序” -> “更改运行时类型” -> “高RAM”\n",
        "- 实测：转换7B级别模型，25G内存的机器就够了；转换13B级别模型需要30G以上的内存（程序莫名崩掉或断开连接就说明内存爆了）\n",
        "- 如果选了“高RAM”之后内存还是不够大的话，选择以下操作，有的时候会分配出很高内存的机器，祝你好运😄！\n",
        "    - 可以把GPU或者TPU也选上（虽然不会用到）\n",
        "    - 选GPU时，Pro用户可选“高级”类型GPU\n",
        "\n",
        "以下信息配置信息供参考（Pro订阅下测试），运行时规格设置为“高RAM”时的设备配置如下（有随机性）：\n",
        "\n",
        "| 硬件加速器  |  RAM  |  硬盘  |\n",
        "| :-- | :--: | :--: |\n",
        "| None | 25GB | 225GB |\n",
        "| TPU | 35GB | 225GB |\n",
        "| GPU（标准，T4）| 25GB | 166GB |\n",
        "| GPU（高性能，V100）| 25GB | 166GB |\n",
        "| GPU（高性能，A100）| **80GB** | 166GB |\n",
        "\n",
        "*温馨提示：用完之后注意断开运行时，选择满足要求的最低配置即可，避免不必要的计算单元消耗（Pro只给100个计算单元）。*"
      ],
      "metadata": {
        "id": "B1c96_k3MahN"
      }
    },
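    {
      "cell_type": "markdown",
      "source": [
        "Before starting, it can help to confirm how much RAM the allocated runtime actually has. The snippet below is a minimal sketch, not part of the original workflow: the helper names `total_ram_gb` and `largest_convertible_model` are hypothetical, the 25 GB and 35 GB thresholds are simply the rough figures quoted in the tips above, and the `os.sysconf` keys assume a Linux runtime such as Colab.\n",
        "\n",
        "```python\n",
        "import os\n",
        "\n",
        "\n",
        "def total_ram_gb():\n",
        "    # Physical RAM in GB (10**9 bytes); these sysconf names assume Linux.\n",
        "    return os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / 1e9\n",
        "\n",
        "\n",
        "def largest_convertible_model(ram_gb):\n",
        "    # Thresholds are the approximate figures from the tips above, not hard limits.\n",
        "    if ram_gb >= 35:\n",
        "        return '13B'\n",
        "    if ram_gb >= 25:\n",
        "        return '7B'\n",
        "    return 'none'\n",
        "\n",
        "\n",
        "ram = total_ram_gb()\n",
        "print(f'Total RAM: {ram:.1f} GB, largest model likely convertible: {largest_convertible_model(ram)}')\n",
        "```\n",
        "\n",
        "If the result is `none`, disconnecting and reconnecting the runtime, or switching the accelerator type, may yield a larger machine, as described in the tips above."
      ],
      "metadata": {}
    },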
    {
      "cell_type": "markdown",
      "source": [
        "## 安装相关依赖"
      ],
      "metadata": {
        "id": "vScqHD_jMFOV"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "E5WKFJXIL6ZU",
        "outputId": "7ce317e5-c105-49a8-d1af-70c29e6246e1"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting transformers\n",
            "  Downloading transformers-4.28.0-py3-none-any.whl (7.0 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.0/7.0 MB\u001b[0m \u001b[31m54.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers) (2.27.1)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (1.24.2)\n",
            "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (6.0)\n",
            "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers) (4.65.0)\n",
            "Collecting huggingface-hub<1.0,>=0.11.0\n",
            "  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m200.1/200.1 kB\u001b[0m \u001b[31m24.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (2022.10.31)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers) (3.11.0)\n",
            "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n",
            "  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m97.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (23.0)\n",
            "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)\n",
            "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2.0.12)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2022.12.7)\n",
            "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (1.26.15)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (3.4)\n",
            "Installing collected packages: tokenizers, huggingface-hub, transformers\n",
            "Successfully installed huggingface-hub-0.13.4 tokenizers-0.13.3 transformers-4.28.0\n",
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting peft\n",
            "  Downloading peft-0.2.0-py3-none-any.whl (40 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.3/40.3 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: psutil in /usr/local/lib/python3.9/dist-packages (from peft) (5.9.4)\n",
            "Requirement already satisfied: transformers in /usr/local/lib/python3.9/dist-packages (from peft) (4.28.0)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.9/dist-packages (from peft) (6.0)\n",
            "Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.9/dist-packages (from peft) (2.0.0+cu118)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from peft) (23.0)\n",
            "Collecting accelerate\n",
            "  Downloading accelerate-0.18.0-py3-none-any.whl (215 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m215.3/215.3 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from peft) (1.24.2)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (1.11.1)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.1.2)\n",
            "Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (2.0.0)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.1)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.11.0)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (4.5.0)\n",
            "Requirement already satisfied: lit in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.13.0->peft) (16.0.1)\n",
            "Requirement already satisfied: cmake in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.13.0->peft) (3.25.2)\n",
            "Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (0.13.4)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (2022.10.31)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (2.27.1)\n",
            "Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (0.13.3)\n",
            "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (4.65.0)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.9/dist-packages (from jinja2->torch>=1.13.0->peft) (2.1.2)\n",
            "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (2.0.12)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (3.4)\n",
            "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (1.26.15)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (2022.12.7)\n",
            "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.9/dist-packages (from sympy->torch>=1.13.0->peft) (1.3.0)\n",
            "Installing collected packages: accelerate, peft\n",
            "Successfully installed accelerate-0.18.0 peft-0.2.0\n",
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting sentencepiece\n",
            "  Downloading sentencepiece-0.1.98-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m18.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hInstalling collected packages: sentencepiece\n",
            "Successfully installed sentencepiece-0.1.98\n"
          ]
        }
      ],
      "source": [
        "!pip install transformers\n",
        "!pip install peft\n",
        "!pip install sentencepiece"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 克隆目录和代码"
      ],
      "metadata": {
        "id": "ygb1xFIMNQKw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca\n",
        "!git clone https://github.com/ggerganov/llama.cpp"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yCEJh7NJNXz9",
        "outputId": "91a0e4ff-af63-4f8e-ab82-ee4ddf583033"
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Cloning into 'Chinese-LLaMA-Alpaca'...\n",
            "remote: Enumerating objects: 559, done.\u001b[K\n",
            "remote: Counting objects: 100% (129/129), done.\u001b[K\n",
            "remote: Compressing objects: 100% (115/115), done.\u001b[K\n",
            "remote: Total 559 (delta 30), reused 22 (delta 14), pack-reused 430\u001b[K\n",
            "Receiving objects: 100% (559/559), 10.71 MiB | 25.49 MiB/s, done.\n",
            "Resolving deltas: 100% (333/333), done.\n",
            "Cloning into 'llama.cpp'...\n",
            "remote: Enumerating objects: 1701, done.\u001b[K\n",
            "remote: Counting objects: 100% (1701/1701), done.\u001b[K\n",
            "remote: Compressing objects: 100% (620/620), done.\u001b[K\n",
            "remote: Total 1701 (delta 1084), reused 1623 (delta 1047), pack-reused 0\u001b[K\n",
            "Receiving objects: 100% (1701/1701), 1.86 MiB | 14.74 MiB/s, done.\n",
            "Resolving deltas: 100% (1084/1084), done.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 合并模型（以Alpaca-7B为例）\n",
        "\n",
        "**⚠️ 再次提醒：7B模型需要25G内存，13B模型需要35G+内存。**\n",
        "\n",
        "此处使用的是🤗模型库中提供的基模型（已是HF格式），而不是Facebook官方的LLaMA模型，因此略去将原版LLaMA转换为HF格式的步骤。\n",
        "\n",
        "**这里直接运行第二步：合并LoRA权重**，生成全量模型权重。可以直接指定🤗模型库的地址，也可以是本地存放地址。\n",
        "- 基模型：`decapoda-research/llama-7b-hf` *（use at your own risk）*\n",
        "- LoRA模型：`ziqingyang/chinese-alpaca-lora-7b`\n",
        "\n",
        "💡 转换13B模型提示：\n",
        "- 请将参数`--base_model`和`--lora_model`中的的`7b`改为`13b`即可\n",
        "- **免费用户必须增加一个参数`--offload_dir`以缓解内存压力**，例如`--offload_dir ./offload_temp`\n",
        "\n",
        "该过程比较耗时（下载+转换），需要几分钟到十几分钟不等，请耐心等待。\n",
        "转换好的模型存放在`alpaca-combined`目录。\n",
        "如果你不需要量化模型，那么到这一步就结束了。"
      ],
      "metadata": {
        "id": "nIyxX0DSNsgQ"
      }
    },
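    {
      "cell_type": "markdown",
      "source": [
        "The 13B substitution described above can be sketched in Python. This is an illustration only: `merge_command` is a hypothetical helper, and the 13B repository names are assumed to follow the `7b` -> `13b` renaming rule from the tips; the script path and flags are the ones used in this notebook. The snippet only prints the command and does not run it.\n",
        "\n",
        "```python\n",
        "import shlex\n",
        "\n",
        "\n",
        "def merge_command(size='7b', offload_dir=None):\n",
        "    # Script path and flags are the ones used in this notebook.\n",
        "    # Assumption: 13B repository names follow the 7b -> 13b renaming rule.\n",
        "    cmd = [\n",
        "        'python', './Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py',\n",
        "        '--base_model', f'decapoda-research/llama-{size}-hf',\n",
        "        '--lora_model', f'ziqingyang/chinese-alpaca-lora-{size}',\n",
        "        '--output_dir', 'alpaca-combined',\n",
        "    ]\n",
        "    if offload_dir is not None:\n",
        "        # Per the tips, free-tier users need this flag for the 13B model.\n",
        "        cmd += ['--offload_dir', offload_dir]\n",
        "    return cmd\n",
        "\n",
        "\n",
        "# Print (do not run) the 13B command for a free-tier runtime.\n",
        "print(shlex.join(merge_command('13b', offload_dir='./offload_temp')))\n",
        "```\n",
        "\n",
        "The printed line can be pasted into a code cell with a leading `!` to run it."
      ],
      "metadata": {}
    },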
    {
      "cell_type": "code",
      "source": [
        "!python ./Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py \\\n",
        "    --base_model 'decapoda-research/llama-7b-hf' \\\n",
        "    --lora_model 'ziqingyang/chinese-alpaca-lora-7b' \\\n",
        "    --output_dir alpaca-combined"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5AV4EW5hNhVV",
        "outputId": "e34419d4-b7c9-4e22-af37-abf80d4163ba"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "2023-04-14 10:13:45.382526: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "Downloading tokenizer.model: 100% 758k/758k [00:00<00:00, 12.7MB/s]\n",
            "Downloading (…)cial_tokens_map.json: 100% 96.0/96.0 [00:00<00:00, 15.3kB/s]\n",
            "Downloading (…)okenizer_config.json: 100% 166/166 [00:00<00:00, 63.2kB/s]\n",
            "Downloading (…)lve/main/config.json: 100% 427/427 [00:00<00:00, 63.4kB/s]\n",
            "Downloading (…)model.bin.index.json: 100% 25.5k/25.5k [00:00<00:00, 9.41MB/s]\n",
            "Downloading shards:   0% 0/33 [00:00<?, ?it/s]\n",
            "Downloading (…)l-00001-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 95.1MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 155MB/s] \u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  31% 126M/405M [00:00<00:01, 205MB/s] \u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  39% 157M/405M [00:00<00:01, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  47% 189M/405M [00:00<00:01, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  54% 220M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  62% 252M/405M [00:01<00:00, 214MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  70% 283M/405M [00:01<00:00, 215MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  78% 315M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  85% 346M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  93% 377M/405M [00:01<00:00, 214MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin: 100% 405M/405M [00:01<00:00, 205MB/s]\n",
            "Downloading shards:   3% 1/33 [00:02<01:07,  2.11s/it]\n",
            "Downloading (…)l-00002-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 91.8MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 150MB/s] \u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  28% 115M/405M [00:00<00:01, 192MB/s] \u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  34% 136M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  39% 157M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  44% 178M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  60% 241M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  80% 325M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  85% 346M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
            "Downloading shards:   6% 2/33 [00:04<01:07,  2.17s/it]\n",
            "Downloading (…)l-00003-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 90.9MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 143MB/s] \u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 166MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  28% 115M/405M [00:00<00:01, 190MB/s] \u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  34% 136M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  39% 157M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  44% 178M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  49% 199M/405M [00:01<00:01, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  54% 220M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  75% 304M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  80% 325M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  85% 346M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin: 100% 405M/405M [00:02<00:00, 187MB/s]\n",
            "Downloading shards:   9% 3/33 [00:06<01:06,  2.23s/it]\n",
            "Downloading (…)l-00004-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.5MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  44% 178M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  49% 199M/405M [00:01<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  54% 220M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  60% 241M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  65% 262M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  70% 283M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  80% 325M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  85% 346M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin: 100% 405M/405M [00:02<00:00, 195MB/s]\n",
            "Downloading shards:  12% 4/33 [00:08<01:04,  2.22s/it]\n",
            "Downloading (…)l-00005-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 86.5MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 167MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  28% 115M/405M [00:00<00:01, 189MB/s] \u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  34% 136M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  39% 157M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  44% 178M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  49% 199M/405M [00:01<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  54% 220M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  60% 241M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  65% 262M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  70% 283M/405M [00:01<00:00, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  75% 304M/405M [00:01<00:00, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  80% 325M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  85% 346M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  91% 367M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin: 100% 405M/405M [00:02<00:00, 188MB/s]\n",
            "Downloading shards:  15% 5/33 [00:11<01:03,  2.26s/it]\n",
            "Downloading (…)l-00006-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 90.3MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 150MB/s] \u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  28% 115M/405M [00:00<00:01, 190MB/s] \u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  34% 136M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  39% 157M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  44% 178M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  49% 199M/405M [00:01<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  54% 220M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  70% 283M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  75% 304M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  80% 325M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  85% 346M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  91% 367M/405M [00:01<00:00, 191MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin: 100% 405M/405M [00:02<00:00, 190MB/s]\n",
            "Downloading shards:  18% 6/33 [00:13<01:01,  2.26s/it]\n",
            "Downloading (…)l-00007-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.4MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  28% 115M/405M [00:00<00:01, 198MB/s] \u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  34% 136M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  44% 178M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  49% 199M/405M [00:01<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  54% 220M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  65% 262M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  70% 283M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  75% 304M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  85% 346M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  91% 367M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
            "Downloading shards:  21% 7/33 [00:15<00:58,  2.24s/it]\n",
            "Downloading (…)l-00008-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.4MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 153MB/s] \u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  13% 52.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  28% 115M/405M [00:00<00:01, 199MB/s] \u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  34% 136M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  44% 178M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  54% 220M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  60% 241M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  75% 304M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  80% 325M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  85% 346M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  91% 367M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
            "Downloading shards:  24% 8/33 [00:17<00:55,  2.22s/it]\n",
            "Downloading (…)l-00009-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 89.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 149MB/s] \u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  44% 178M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  49% 199M/405M [00:01<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  54% 220M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  60% 241M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  65% 262M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  70% 283M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  80% 325M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  85% 346M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
            "Downloading shards:  27% 9/33 [00:20<00:53,  2.22s/it]\n",
            "Downloading (…)l-00010-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.7MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  28% 115M/405M [00:00<00:01, 196MB/s] \u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  34% 136M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  39% 157M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  44% 178M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  52% 210M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  57% 231M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  62% 252M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  67% 273M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  73% 294M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  78% 315M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  83% 336M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  88% 357M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  93% 377M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
            "Downloading shards:  30% 10/33 [00:22<00:50,  2.22s/it]\n",
            "Downloading (…)l-00011-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 92.9MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 149MB/s] \u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 183MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  44% 178M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  49% 199M/405M [00:01<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  54% 220M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  60% 241M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  80% 325M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  85% 346M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin: 100% 405M/405M [00:02<00:00, 195MB/s]\n",
            "Downloading shards:  33% 11/33 [00:24<00:48,  2.21s/it]\n",
            "Downloading (…)l-00012-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 86.7MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 143MB/s] \u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 165MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  28% 115M/405M [00:00<00:01, 189MB/s] \u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  34% 136M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  39% 157M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  44% 178M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  49% 199M/405M [00:01<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  54% 220M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  60% 241M/405M [00:01<00:00, 191MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  65% 262M/405M [00:01<00:00, 191MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  70% 283M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  75% 304M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  80% 325M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  85% 346M/405M [00:01<00:00, 194MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  91% 367M/405M [00:01<00:00, 194MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin: 100% 405M/405M [00:02<00:00, 186MB/s]\n",
            "Downloading shards:  36% 12/33 [00:26<00:47,  2.25s/it]\n",
            "Downloading (…)l-00013-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 92.5MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 151MB/s] \u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  28% 115M/405M [00:00<00:01, 197MB/s] \u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  39% 157M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  44% 178M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  49% 199M/405M [00:01<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  54% 220M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  75% 304M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  80% 325M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  85% 346M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  91% 367M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin: 100% 405M/405M [00:02<00:00, 195MB/s]\n",
            "Downloading shards:  39% 13/33 [00:28<00:44,  2.23s/it]\n",
            "Downloading (…)l-00014-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:   3% 10.5M/405M [00:02<01:50, 3.56MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:   5% 21.0M/405M [00:04<01:10, 5.46MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:   8% 31.5M/405M [00:04<00:50, 7.45MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  10% 41.9M/405M [00:05<00:37, 9.67MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  13% 52.4M/405M [00:06<00:29, 12.1MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  16% 62.9M/405M [00:06<00:22, 14.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  18% 73.4M/405M [00:06<00:18, 17.6MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  21% 83.9M/405M [00:07<00:16, 20.0MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  23% 94.4M/405M [00:07<00:14, 21.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  26% 105M/405M [00:07<00:12, 23.5MB/s] \u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  28% 115M/405M [00:08<00:11, 24.7MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  31% 126M/405M [00:08<00:10, 25.6MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  34% 136M/405M [00:09<00:10, 26.3MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  36% 147M/405M [00:09<00:09, 26.8MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  39% 157M/405M [00:09<00:09, 27.1MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  41% 168M/405M [00:10<00:08, 27.4MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  44% 178M/405M [00:10<00:08, 27.6MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  47% 189M/405M [00:10<00:07, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  49% 199M/405M [00:11<00:07, 27.8MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  52% 210M/405M [00:11<00:07, 27.8MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  54% 220M/405M [00:12<00:06, 27.8MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  57% 231M/405M [00:12<00:06, 27.8MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  60% 241M/405M [00:12<00:05, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  62% 252M/405M [00:13<00:05, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  65% 262M/405M [00:13<00:05, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  67% 273M/405M [00:13<00:04, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  70% 283M/405M [00:14<00:04, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  73% 294M/405M [00:14<00:03, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  75% 304M/405M [00:15<00:03, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  78% 315M/405M [00:15<00:03, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  80% 325M/405M [00:15<00:02, 27.8MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  83% 336M/405M [00:16<00:02, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  85% 346M/405M [00:16<00:02, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  88% 357M/405M [00:16<00:01, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  91% 367M/405M [00:17<00:01, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  93% 377M/405M [00:17<00:00, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  96% 388M/405M [00:18<00:00, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  98% 398M/405M [00:18<00:00, 27.9MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin: 100% 405M/405M [00:18<00:00, 21.7MB/s]\n",
            "Downloading shards:  42% 14/33 [00:48<02:19,  7.34s/it]\n",
            "Downloading (…)l-00015-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:   3% 10.5M/405M [00:02<01:20, 4.90MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:   5% 21.0M/405M [00:03<00:54, 7.08MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:   8% 31.5M/405M [00:03<00:40, 9.31MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  10% 41.9M/405M [00:04<00:30, 11.8MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  13% 52.4M/405M [00:04<00:24, 14.4MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  16% 62.9M/405M [00:05<00:19, 17.2MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  18% 73.4M/405M [00:05<00:16, 19.6MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  21% 83.9M/405M [00:05<00:14, 21.6MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  23% 94.4M/405M [00:06<00:13, 23.2MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  26% 105M/405M [00:06<00:12, 24.4MB/s] \u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  28% 115M/405M [00:07<00:11, 25.3MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  31% 126M/405M [00:07<00:10, 26.0MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  34% 136M/405M [00:07<00:10, 26.5MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  36% 147M/405M [00:08<00:09, 26.8MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  39% 157M/405M [00:08<00:09, 27.1MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  41% 168M/405M [00:09<00:08, 27.3MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  44% 178M/405M [00:09<00:08, 27.4MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  47% 189M/405M [00:09<00:07, 27.5MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  49% 199M/405M [00:10<00:07, 27.5MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  52% 210M/405M [00:10<00:07, 27.6MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  54% 220M/405M [00:10<00:06, 27.6MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  57% 231M/405M [00:11<00:06, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  60% 241M/405M [00:11<00:05, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  62% 252M/405M [00:12<00:05, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  65% 262M/405M [00:12<00:05, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  67% 273M/405M [00:12<00:04, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  70% 283M/405M [00:13<00:04, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  73% 294M/405M [00:13<00:04, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  75% 304M/405M [00:13<00:03, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  78% 315M/405M [00:14<00:03, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  80% 325M/405M [00:14<00:02, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  83% 336M/405M [00:15<00:02, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  85% 346M/405M [00:15<00:02, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  88% 357M/405M [00:15<00:01, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  91% 367M/405M [00:16<00:01, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  93% 377M/405M [00:16<00:00, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  96% 388M/405M [00:16<00:00, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  98% 398M/405M [00:17<00:00, 27.7MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin: 100% 405M/405M [00:17<00:00, 23.0MB/s]\n",
            "Downloading shards:  45% 15/33 [01:06<03:10, 10.56s/it]\n",
            "Downloading (…)l-00016-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 91.8MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 171MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  28% 115M/405M [00:00<00:01, 191MB/s] \u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  36% 147M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  41% 168M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  47% 189M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  52% 210M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  57% 231M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  62% 252M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  67% 273M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  73% 294M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  78% 315M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  83% 336M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  88% 357M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  93% 377M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
            "Downloading shards:  48% 16/33 [01:08<02:16,  8.06s/it]\n",
            "Downloading (…)l-00017-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 90.4MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 143MB/s] \u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 183MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  28% 115M/405M [00:00<00:01, 189MB/s] \u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  34% 136M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  39% 157M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  44% 178M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  49% 199M/405M [00:01<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  60% 241M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  65% 262M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  70% 283M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  75% 304M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  85% 346M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  91% 367M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
            "Downloading shards:  52% 17/33 [01:10<01:40,  6.30s/it]\n",
            "Downloading (…)l-00018-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 89.0MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 170MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 183MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  28% 115M/405M [00:00<00:01, 194MB/s] \u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  34% 136M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  39% 157M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  44% 178M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  60% 241M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  65% 262M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  75% 304M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  80% 325M/405M [00:01<00:00, 194MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  85% 346M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  91% 367M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
            "Downloading shards:  55% 18/33 [01:12<01:16,  5.09s/it]\n",
            "Downloading (…)l-00019-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 85.9MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 142MB/s] \u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 167MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  28% 115M/405M [00:00<00:01, 185MB/s] \u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  34% 136M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  39% 157M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  44% 178M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  49% 199M/405M [00:01<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  54% 220M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  60% 241M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  70% 283M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  75% 304M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  80% 325M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  85% 346M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  91% 367M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin: 100% 405M/405M [00:02<00:00, 189MB/s]\n",
            "Downloading shards:  58% 19/33 [01:15<00:59,  4.24s/it]\n",
            "Downloading (…)l-00020-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 85.8MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  28% 115M/405M [00:00<00:01, 191MB/s] \u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  34% 136M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  39% 157M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  44% 178M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  49% 199M/405M [00:01<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  54% 220M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  65% 262M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  75% 304M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  80% 325M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  88% 357M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  93% 377M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
            "Downloading shards:  61% 20/33 [01:17<00:47,  3.64s/it]\n",
            "Downloading (…)l-00021-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 84.8MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 141MB/s] \u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 168MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  28% 115M/405M [00:00<00:01, 192MB/s] \u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  34% 136M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  39% 157M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  44% 178M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  60% 241M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  65% 262M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  70% 283M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  75% 304M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  85% 346M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  91% 367M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
            "Downloading shards:  64% 21/33 [01:19<00:38,  3.21s/it]\n",
            "Downloading (…)l-00022-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 89.8MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 147MB/s] \u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  28% 115M/405M [00:00<00:01, 194MB/s] \u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  34% 136M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  39% 157M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  44% 178M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  49% 199M/405M [00:01<00:01, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  54% 220M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  80% 325M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  85% 346M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin: 100% 405M/405M [00:02<00:00, 193MB/s]\n",
            "Downloading shards:  67% 22/33 [01:21<00:32,  2.92s/it]\n",
            "Downloading (…)l-00023-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
            "Downloading shards:  70% 23/33 [01:23<00:27,  2.70s/it]\n",
            "Downloading (…)l-00024-of-00033.bin: 100% 405M/405M [00:02<00:00, 199MB/s]\n",
            "Downloading shards:  73% 24/33 [01:26<00:22,  2.54s/it]\n",
            "Downloading (…)l-00025-of-00033.bin: 100% 405M/405M [00:02<00:00, 198MB/s]\n",
            "Downloading shards:  76% 25/33 [01:28<00:19,  2.43s/it]\n",
            "Downloading (…)l-00026-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
            "Downloading shards:  79% 26/33 [01:30<00:16,  2.37s/it]\n",
            "Downloading (…)l-00027-of-00033.bin: 100% 405M/405M [00:02<00:00, 198MB/s]\n",
            "Downloading shards:  82% 27/33 [01:32<00:13,  2.31s/it]\n",
            "Downloading (…)l-00028-of-00033.bin: 100% 405M/405M [00:02<00:00, 193MB/s]\n",
            "Downloading shards:  85% 28/33 [01:34<00:11,  2.29s/it]\n",
            "Downloading (…)l-00029-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
            "Downloading shards:  88% 29/33 [01:37<00:09,  2.28s/it]\n",
            "Downloading (…)l-00030-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
            "Downloading shards:  91% 30/33 [01:39<00:06,  2.27s/it]\n",
            "Downloading (…)l-00031-of-00033.bin: 100% 405M/405M [00:04<00:00, 89.2MB/s]\n",
            "Downloading shards:  94% 31/33 [01:44<00:06,  3.01s/it]\n",
            "Downloading (…)l-00032-of-00033.bin: 100% 405M/405M [00:04<00:00, 92.0MB/s]\n",
            "Downloading shards:  97% 32/33 [01:48<00:03,  3.47s/it]\n",
            "Downloading (…)l-00033-of-00033.bin: 100% 524M/524M [00:06<00:00, 81.3MB/s]\n",
            "Downloading shards: 100% 33/33 [01:55<00:00,  3.49s/it]\n",
            "Loading checkpoint shards: 100% 33/33 [00:14<00:00,  2.30it/s]\n",
            "Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 18.4kB/s]\n",
            "Extended vocabulary size: 49954\n",
            "Loading LoRA for 7B model\n",
            "Downloading (…)/adapter_config.json: 100% 472/472 [00:00<00:00, 166kB/s]\n",
            "Downloading adapter_model.bin: 100% 858M/858M [00:08<00:00, 103MB/s]\n",
            "Peft version: 0.2.0\n",
            "Merging model\n",
            "Saving shard 1 of 1 into alpaca-combined/consolidated.00.pth\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
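        "## Quantize the model\n",
        "Next, we use [llama.cpp](https://github.com/ggerganov/llama.cpp) to convert the full-weight model produced in the previous step and generate a 4-bit quantized model.\n",
        "\n",
        "### Build the tools\n",
        "\n",
        "First, compile llama.cpp."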
      ],
      "metadata": {
        "id": "ueexcKo-Q_EW"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && make"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_GbjsT2wRRCR",
        "outputId": "8da3382c-6bff-4030-905b-bb4f622766d7"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "I llama.cpp build info: \n",
            "I UNAME_S:  Linux\n",
            "I UNAME_P:  x86_64\n",
            "I UNAME_M:  x86_64\n",
            "I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native\n",
            "I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native\n",
            "I LDFLAGS:  \n",
            "I CC:       cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
            "I CXX:      g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
            "\n",
            "cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native   -c ggml.c -o ggml.o\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o\n",
            "In file included from llama.cpp:6:\n",
            "llama_util.h:60:2: warning: extra ‘;’ [-Wpedantic]\n",
            "   60 | };\n",
            "      |  ^\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/main/main.cpp ggml.o llama.o common.o -o main \n",
            "\n",
            "====  Run ./main -h for help.  ====\n",
            "\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize/quantize.cpp ggml.o llama.o -o quantize \n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity \n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding \n"
          ]
        }
      ]
    },
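    {
      "cell_type": "markdown",
      "source": [
        "### Convert the model to ggml format (FP16)\n",
        "\n",
        "In this step, we convert the model to ggml format (FP16).\n",
        "- Before converting, move the `alpaca-combined` directory so that the model files sit under `llama.cpp/zh-models/7B` and `tokenizer.model` sits under `llama.cpp/zh-models`.\n",
        "- Where is the tokenizer?\n",
        "    - It is included in the `alpaca-combined` directory.\n",
        "    - Alternatively, download it from: https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model (note: the Alpaca and LLaMA `tokenizer.model` files are not interchangeable!)\n",
        "\n",
        "💡 Tips for converting the 13B model:\n",
        "- You can use the 7B tokenizer directly; the 13B and 7B tokenizers are identical.\n",
        "- The Alpaca and LLaMA `tokenizer.model` files are not interchangeable!\n",
        "- Wherever `7B` appears below, it is only a folder name and has no effect on the conversion; you may rename it or leave it as is."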
      ],
      "metadata": {
        "id": "gw2xpYC0RcQC"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && mkdir zh-models && mv ../alpaca-combined zh-models/7B\n",
        "!mv llama.cpp/zh-models/7B/tokenizer.model llama.cpp/zh-models/\n",
        "!ls llama.cpp/zh-models/"
      ],
      "metadata": {
        "id": "5KgnFVStRjio",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "09ba7058-e2fb-4ae1-8539-62228df6ea09"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "7B  tokenizer.model\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && python convert.py zh-models/7B/"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NUHeoTMQS1AQ",
        "outputId": "356f9e70-d05d-42d3-ed8c-fc052e11a855"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Loading model file zh-models/7B/consolidated.00.pth\n",
            "Loading vocab file zh-models/tokenizer.model\n",
            "Writing vocab...\n",
            "[1/291] Writing tensor tok_embeddings.weight, size 49954 x 4096...\n",
            "[2/291] Writing tensor norm.weight, size 4096...\n",
            "[3/291] Writing tensor output.weight, size 49954 x 4096...\n",
            "[4/291] Writing tensor layers.0.attention.wq.weight, size 4096 x 4096...\n",
            "[5/291] Writing tensor layers.0.attention.wk.weight, size 4096 x 4096...\n",
            "[6/291] Writing tensor layers.0.attention.wv.weight, size 4096 x 4096...\n",
            "[7/291] Writing tensor layers.0.attention.wo.weight, size 4096 x 4096...\n",
            "[8/291] Writing tensor layers.0.attention_norm.weight, size 4096...\n",
            "[9/291] Writing tensor layers.0.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[10/291] Writing tensor layers.0.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[11/291] Writing tensor layers.0.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[12/291] Writing tensor layers.0.ffn_norm.weight, size 4096...\n",
            "[13/291] Writing tensor layers.1.attention.wq.weight, size 4096 x 4096...\n",
            "[14/291] Writing tensor layers.1.attention.wk.weight, size 4096 x 4096...\n",
            "[15/291] Writing tensor layers.1.attention.wv.weight, size 4096 x 4096...\n",
            "[16/291] Writing tensor layers.1.attention.wo.weight, size 4096 x 4096...\n",
            "[17/291] Writing tensor layers.1.attention_norm.weight, size 4096...\n",
            "[18/291] Writing tensor layers.1.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[19/291] Writing tensor layers.1.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[20/291] Writing tensor layers.1.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[21/291] Writing tensor layers.1.ffn_norm.weight, size 4096...\n",
            "[22/291] Writing tensor layers.2.attention.wq.weight, size 4096 x 4096...\n",
            "[23/291] Writing tensor layers.2.attention.wk.weight, size 4096 x 4096...\n",
            "[24/291] Writing tensor layers.2.attention.wv.weight, size 4096 x 4096...\n",
            "[25/291] Writing tensor layers.2.attention.wo.weight, size 4096 x 4096...\n",
            "[26/291] Writing tensor layers.2.attention_norm.weight, size 4096...\n",
            "[27/291] Writing tensor layers.2.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[28/291] Writing tensor layers.2.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[29/291] Writing tensor layers.2.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[30/291] Writing tensor layers.2.ffn_norm.weight, size 4096...\n",
            "[31/291] Writing tensor layers.3.attention.wq.weight, size 4096 x 4096...\n",
            "[32/291] Writing tensor layers.3.attention.wk.weight, size 4096 x 4096...\n",
            "[33/291] Writing tensor layers.3.attention.wv.weight, size 4096 x 4096...\n",
            "[34/291] Writing tensor layers.3.attention.wo.weight, size 4096 x 4096...\n",
            "[35/291] Writing tensor layers.3.attention_norm.weight, size 4096...\n",
            "[36/291] Writing tensor layers.3.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[37/291] Writing tensor layers.3.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[38/291] Writing tensor layers.3.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[39/291] Writing tensor layers.3.ffn_norm.weight, size 4096...\n",
            "[40/291] Writing tensor layers.4.attention.wq.weight, size 4096 x 4096...\n",
            "[41/291] Writing tensor layers.4.attention.wk.weight, size 4096 x 4096...\n",
            "[42/291] Writing tensor layers.4.attention.wv.weight, size 4096 x 4096...\n",
            "[43/291] Writing tensor layers.4.attention.wo.weight, size 4096 x 4096...\n",
            "[44/291] Writing tensor layers.4.attention_norm.weight, size 4096...\n",
            "[45/291] Writing tensor layers.4.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[46/291] Writing tensor layers.4.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[47/291] Writing tensor layers.4.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[48/291] Writing tensor layers.4.ffn_norm.weight, size 4096...\n",
            "[49/291] Writing tensor layers.5.attention.wq.weight, size 4096 x 4096...\n",
            "[50/291] Writing tensor layers.5.attention.wk.weight, size 4096 x 4096...\n",
            "[51/291] Writing tensor layers.5.attention.wv.weight, size 4096 x 4096...\n",
            "[52/291] Writing tensor layers.5.attention.wo.weight, size 4096 x 4096...\n",
            "[53/291] Writing tensor layers.5.attention_norm.weight, size 4096...\n",
            "[54/291] Writing tensor layers.5.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[55/291] Writing tensor layers.5.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[56/291] Writing tensor layers.5.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[57/291] Writing tensor layers.5.ffn_norm.weight, size 4096...\n",
            "[58/291] Writing tensor layers.6.attention.wq.weight, size 4096 x 4096...\n",
            "[59/291] Writing tensor layers.6.attention.wk.weight, size 4096 x 4096...\n",
            "[60/291] Writing tensor layers.6.attention.wv.weight, size 4096 x 4096...\n",
            "[61/291] Writing tensor layers.6.attention.wo.weight, size 4096 x 4096...\n",
            "[62/291] Writing tensor layers.6.attention_norm.weight, size 4096...\n",
            "[63/291] Writing tensor layers.6.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[64/291] Writing tensor layers.6.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[65/291] Writing tensor layers.6.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[66/291] Writing tensor layers.6.ffn_norm.weight, size 4096...\n",
            "[67/291] Writing tensor layers.7.attention.wq.weight, size 4096 x 4096...\n",
            "[68/291] Writing tensor layers.7.attention.wk.weight, size 4096 x 4096...\n",
            "[69/291] Writing tensor layers.7.attention.wv.weight, size 4096 x 4096...\n",
            "[70/291] Writing tensor layers.7.attention.wo.weight, size 4096 x 4096...\n",
            "[71/291] Writing tensor layers.7.attention_norm.weight, size 4096...\n",
            "[72/291] Writing tensor layers.7.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[73/291] Writing tensor layers.7.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[74/291] Writing tensor layers.7.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[75/291] Writing tensor layers.7.ffn_norm.weight, size 4096...\n",
            "[76/291] Writing tensor layers.8.attention.wq.weight, size 4096 x 4096...\n",
            "[77/291] Writing tensor layers.8.attention.wk.weight, size 4096 x 4096...\n",
            "[78/291] Writing tensor layers.8.attention.wv.weight, size 4096 x 4096...\n",
            "[79/291] Writing tensor layers.8.attention.wo.weight, size 4096 x 4096...\n",
            "[80/291] Writing tensor layers.8.attention_norm.weight, size 4096...\n",
            "[81/291] Writing tensor layers.8.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[82/291] Writing tensor layers.8.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[83/291] Writing tensor layers.8.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[84/291] Writing tensor layers.8.ffn_norm.weight, size 4096...\n",
            "[85/291] Writing tensor layers.9.attention.wq.weight, size 4096 x 4096...\n",
            "[86/291] Writing tensor layers.9.attention.wk.weight, size 4096 x 4096...\n",
            "[87/291] Writing tensor layers.9.attention.wv.weight, size 4096 x 4096...\n",
            "[88/291] Writing tensor layers.9.attention.wo.weight, size 4096 x 4096...\n",
            "[89/291] Writing tensor layers.9.attention_norm.weight, size 4096...\n",
            "[90/291] Writing tensor layers.9.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[91/291] Writing tensor layers.9.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[92/291] Writing tensor layers.9.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[93/291] Writing tensor layers.9.ffn_norm.weight, size 4096...\n",
            "[94/291] Writing tensor layers.10.attention.wq.weight, size 4096 x 4096...\n",
            "[95/291] Writing tensor layers.10.attention.wk.weight, size 4096 x 4096...\n",
            "[96/291] Writing tensor layers.10.attention.wv.weight, size 4096 x 4096...\n",
            "[97/291] Writing tensor layers.10.attention.wo.weight, size 4096 x 4096...\n",
            "[98/291] Writing tensor layers.10.attention_norm.weight, size 4096...\n",
            "[99/291] Writing tensor layers.10.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[100/291] Writing tensor layers.10.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[101/291] Writing tensor layers.10.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[102/291] Writing tensor layers.10.ffn_norm.weight, size 4096...\n",
            "[103/291] Writing tensor layers.11.attention.wq.weight, size 4096 x 4096...\n",
            "[104/291] Writing tensor layers.11.attention.wk.weight, size 4096 x 4096...\n",
            "[105/291] Writing tensor layers.11.attention.wv.weight, size 4096 x 4096...\n",
            "[106/291] Writing tensor layers.11.attention.wo.weight, size 4096 x 4096...\n",
            "[107/291] Writing tensor layers.11.attention_norm.weight, size 4096...\n",
            "[108/291] Writing tensor layers.11.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[109/291] Writing tensor layers.11.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[110/291] Writing tensor layers.11.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[111/291] Writing tensor layers.11.ffn_norm.weight, size 4096...\n",
            "[112/291] Writing tensor layers.12.attention.wq.weight, size 4096 x 4096...\n",
            "[113/291] Writing tensor layers.12.attention.wk.weight, size 4096 x 4096...\n",
            "[114/291] Writing tensor layers.12.attention.wv.weight, size 4096 x 4096...\n",
            "[115/291] Writing tensor layers.12.attention.wo.weight, size 4096 x 4096...\n",
            "[116/291] Writing tensor layers.12.attention_norm.weight, size 4096...\n",
            "[117/291] Writing tensor layers.12.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[118/291] Writing tensor layers.12.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[119/291] Writing tensor layers.12.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[120/291] Writing tensor layers.12.ffn_norm.weight, size 4096...\n",
            "[121/291] Writing tensor layers.13.attention.wq.weight, size 4096 x 4096...\n",
            "[122/291] Writing tensor layers.13.attention.wk.weight, size 4096 x 4096...\n",
            "[123/291] Writing tensor layers.13.attention.wv.weight, size 4096 x 4096...\n",
            "[124/291] Writing tensor layers.13.attention.wo.weight, size 4096 x 4096...\n",
            "[125/291] Writing tensor layers.13.attention_norm.weight, size 4096...\n",
            "[126/291] Writing tensor layers.13.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[127/291] Writing tensor layers.13.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[128/291] Writing tensor layers.13.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[129/291] Writing tensor layers.13.ffn_norm.weight, size 4096...\n",
            "[130/291] Writing tensor layers.14.attention.wq.weight, size 4096 x 4096...\n",
            "[131/291] Writing tensor layers.14.attention.wk.weight, size 4096 x 4096...\n",
            "[132/291] Writing tensor layers.14.attention.wv.weight, size 4096 x 4096...\n",
            "[133/291] Writing tensor layers.14.attention.wo.weight, size 4096 x 4096...\n",
            "[134/291] Writing tensor layers.14.attention_norm.weight, size 4096...\n",
            "[135/291] Writing tensor layers.14.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[136/291] Writing tensor layers.14.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[137/291] Writing tensor layers.14.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[138/291] Writing tensor layers.14.ffn_norm.weight, size 4096...\n",
            "[139/291] Writing tensor layers.15.attention.wq.weight, size 4096 x 4096...\n",
            "[140/291] Writing tensor layers.15.attention.wk.weight, size 4096 x 4096...\n",
            "[141/291] Writing tensor layers.15.attention.wv.weight, size 4096 x 4096...\n",
            "[142/291] Writing tensor layers.15.attention.wo.weight, size 4096 x 4096...\n",
            "[143/291] Writing tensor layers.15.attention_norm.weight, size 4096...\n",
            "[144/291] Writing tensor layers.15.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[145/291] Writing tensor layers.15.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[146/291] Writing tensor layers.15.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[147/291] Writing tensor layers.15.ffn_norm.weight, size 4096...\n",
            "[148/291] Writing tensor layers.16.attention.wq.weight, size 4096 x 4096...\n",
            "[149/291] Writing tensor layers.16.attention.wk.weight, size 4096 x 4096...\n",
            "[150/291] Writing tensor layers.16.attention.wv.weight, size 4096 x 4096...\n",
            "[151/291] Writing tensor layers.16.attention.wo.weight, size 4096 x 4096...\n",
            "[152/291] Writing tensor layers.16.attention_norm.weight, size 4096...\n",
            "[153/291] Writing tensor layers.16.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[154/291] Writing tensor layers.16.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[155/291] Writing tensor layers.16.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[156/291] Writing tensor layers.16.ffn_norm.weight, size 4096...\n",
            "[157/291] Writing tensor layers.17.attention.wq.weight, size 4096 x 4096...\n",
            "[158/291] Writing tensor layers.17.attention.wk.weight, size 4096 x 4096...\n",
            "[159/291] Writing tensor layers.17.attention.wv.weight, size 4096 x 4096...\n",
            "[160/291] Writing tensor layers.17.attention.wo.weight, size 4096 x 4096...\n",
            "[161/291] Writing tensor layers.17.attention_norm.weight, size 4096...\n",
            "[162/291] Writing tensor layers.17.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[163/291] Writing tensor layers.17.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[164/291] Writing tensor layers.17.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[165/291] Writing tensor layers.17.ffn_norm.weight, size 4096...\n",
            "[166/291] Writing tensor layers.18.attention.wq.weight, size 4096 x 4096...\n",
            "[167/291] Writing tensor layers.18.attention.wk.weight, size 4096 x 4096...\n",
            "[168/291] Writing tensor layers.18.attention.wv.weight, size 4096 x 4096...\n",
            "[169/291] Writing tensor layers.18.attention.wo.weight, size 4096 x 4096...\n",
            "[170/291] Writing tensor layers.18.attention_norm.weight, size 4096...\n",
            "[171/291] Writing tensor layers.18.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[172/291] Writing tensor layers.18.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[173/291] Writing tensor layers.18.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[174/291] Writing tensor layers.18.ffn_norm.weight, size 4096...\n",
            "[175/291] Writing tensor layers.19.attention.wq.weight, size 4096 x 4096...\n",
            "[176/291] Writing tensor layers.19.attention.wk.weight, size 4096 x 4096...\n",
            "[177/291] Writing tensor layers.19.attention.wv.weight, size 4096 x 4096...\n",
            "[178/291] Writing tensor layers.19.attention.wo.weight, size 4096 x 4096...\n",
            "[179/291] Writing tensor layers.19.attention_norm.weight, size 4096...\n",
            "[180/291] Writing tensor layers.19.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[181/291] Writing tensor layers.19.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[182/291] Writing tensor layers.19.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[183/291] Writing tensor layers.19.ffn_norm.weight, size 4096...\n",
            "[184/291] Writing tensor layers.20.attention.wq.weight, size 4096 x 4096...\n",
            "[185/291] Writing tensor layers.20.attention.wk.weight, size 4096 x 4096...\n",
            "[186/291] Writing tensor layers.20.attention.wv.weight, size 4096 x 4096...\n",
            "[187/291] Writing tensor layers.20.attention.wo.weight, size 4096 x 4096...\n",
            "[188/291] Writing tensor layers.20.attention_norm.weight, size 4096...\n",
            "[189/291] Writing tensor layers.20.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[190/291] Writing tensor layers.20.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[191/291] Writing tensor layers.20.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[192/291] Writing tensor layers.20.ffn_norm.weight, size 4096...\n",
            "[193/291] Writing tensor layers.21.attention.wq.weight, size 4096 x 4096...\n",
            "[194/291] Writing tensor layers.21.attention.wk.weight, size 4096 x 4096...\n",
            "[195/291] Writing tensor layers.21.attention.wv.weight, size 4096 x 4096...\n",
            "[196/291] Writing tensor layers.21.attention.wo.weight, size 4096 x 4096...\n",
            "[197/291] Writing tensor layers.21.attention_norm.weight, size 4096...\n",
            "[198/291] Writing tensor layers.21.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[199/291] Writing tensor layers.21.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[200/291] Writing tensor layers.21.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[201/291] Writing tensor layers.21.ffn_norm.weight, size 4096...\n",
            "[202/291] Writing tensor layers.22.attention.wq.weight, size 4096 x 4096...\n",
            "[203/291] Writing tensor layers.22.attention.wk.weight, size 4096 x 4096...\n",
            "[204/291] Writing tensor layers.22.attention.wv.weight, size 4096 x 4096...\n",
            "[205/291] Writing tensor layers.22.attention.wo.weight, size 4096 x 4096...\n",
            "[206/291] Writing tensor layers.22.attention_norm.weight, size 4096...\n",
            "[207/291] Writing tensor layers.22.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[208/291] Writing tensor layers.22.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[209/291] Writing tensor layers.22.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[210/291] Writing tensor layers.22.ffn_norm.weight, size 4096...\n",
            "[211/291] Writing tensor layers.23.attention.wq.weight, size 4096 x 4096...\n",
            "[212/291] Writing tensor layers.23.attention.wk.weight, size 4096 x 4096...\n",
            "[213/291] Writing tensor layers.23.attention.wv.weight, size 4096 x 4096...\n",
            "[214/291] Writing tensor layers.23.attention.wo.weight, size 4096 x 4096...\n",
            "[215/291] Writing tensor layers.23.attention_norm.weight, size 4096...\n",
            "[216/291] Writing tensor layers.23.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[217/291] Writing tensor layers.23.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[218/291] Writing tensor layers.23.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[219/291] Writing tensor layers.23.ffn_norm.weight, size 4096...\n",
            "[220/291] Writing tensor layers.24.attention.wq.weight, size 4096 x 4096...\n",
            "[221/291] Writing tensor layers.24.attention.wk.weight, size 4096 x 4096...\n",
            "[222/291] Writing tensor layers.24.attention.wv.weight, size 4096 x 4096...\n",
            "[223/291] Writing tensor layers.24.attention.wo.weight, size 4096 x 4096...\n",
            "[224/291] Writing tensor layers.24.attention_norm.weight, size 4096...\n",
            "[225/291] Writing tensor layers.24.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[226/291] Writing tensor layers.24.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[227/291] Writing tensor layers.24.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[228/291] Writing tensor layers.24.ffn_norm.weight, size 4096...\n",
            "[229/291] Writing tensor layers.25.attention.wq.weight, size 4096 x 4096...\n",
            "[230/291] Writing tensor layers.25.attention.wk.weight, size 4096 x 4096...\n",
            "[231/291] Writing tensor layers.25.attention.wv.weight, size 4096 x 4096...\n",
            "[232/291] Writing tensor layers.25.attention.wo.weight, size 4096 x 4096...\n",
            "[233/291] Writing tensor layers.25.attention_norm.weight, size 4096...\n",
            "[234/291] Writing tensor layers.25.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[235/291] Writing tensor layers.25.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[236/291] Writing tensor layers.25.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[237/291] Writing tensor layers.25.ffn_norm.weight, size 4096...\n",
            "[238/291] Writing tensor layers.26.attention.wq.weight, size 4096 x 4096...\n",
            "[239/291] Writing tensor layers.26.attention.wk.weight, size 4096 x 4096...\n",
            "[240/291] Writing tensor layers.26.attention.wv.weight, size 4096 x 4096...\n",
            "[241/291] Writing tensor layers.26.attention.wo.weight, size 4096 x 4096...\n",
            "[242/291] Writing tensor layers.26.attention_norm.weight, size 4096...\n",
            "[243/291] Writing tensor layers.26.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[244/291] Writing tensor layers.26.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[245/291] Writing tensor layers.26.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[246/291] Writing tensor layers.26.ffn_norm.weight, size 4096...\n",
            "[247/291] Writing tensor layers.27.attention.wq.weight, size 4096 x 4096...\n",
            "[248/291] Writing tensor layers.27.attention.wk.weight, size 4096 x 4096...\n",
            "[249/291] Writing tensor layers.27.attention.wv.weight, size 4096 x 4096...\n",
            "[250/291] Writing tensor layers.27.attention.wo.weight, size 4096 x 4096...\n",
            "[251/291] Writing tensor layers.27.attention_norm.weight, size 4096...\n",
            "[252/291] Writing tensor layers.27.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[253/291] Writing tensor layers.27.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[254/291] Writing tensor layers.27.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[255/291] Writing tensor layers.27.ffn_norm.weight, size 4096...\n",
            "[256/291] Writing tensor layers.28.attention.wq.weight, size 4096 x 4096...\n",
            "[257/291] Writing tensor layers.28.attention.wk.weight, size 4096 x 4096...\n",
            "[258/291] Writing tensor layers.28.attention.wv.weight, size 4096 x 4096...\n",
            "[259/291] Writing tensor layers.28.attention.wo.weight, size 4096 x 4096...\n",
            "[260/291] Writing tensor layers.28.attention_norm.weight, size 4096...\n",
            "[261/291] Writing tensor layers.28.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[262/291] Writing tensor layers.28.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[263/291] Writing tensor layers.28.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[264/291] Writing tensor layers.28.ffn_norm.weight, size 4096...\n",
            "[265/291] Writing tensor layers.29.attention.wq.weight, size 4096 x 4096...\n",
            "[266/291] Writing tensor layers.29.attention.wk.weight, size 4096 x 4096...\n",
            "[267/291] Writing tensor layers.29.attention.wv.weight, size 4096 x 4096...\n",
            "[268/291] Writing tensor layers.29.attention.wo.weight, size 4096 x 4096...\n",
            "[269/291] Writing tensor layers.29.attention_norm.weight, size 4096...\n",
            "[270/291] Writing tensor layers.29.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[271/291] Writing tensor layers.29.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[272/291] Writing tensor layers.29.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[273/291] Writing tensor layers.29.ffn_norm.weight, size 4096...\n",
            "[274/291] Writing tensor layers.30.attention.wq.weight, size 4096 x 4096...\n",
            "[275/291] Writing tensor layers.30.attention.wk.weight, size 4096 x 4096...\n",
            "[276/291] Writing tensor layers.30.attention.wv.weight, size 4096 x 4096...\n",
            "[277/291] Writing tensor layers.30.attention.wo.weight, size 4096 x 4096...\n",
            "[278/291] Writing tensor layers.30.attention_norm.weight, size 4096...\n",
            "[279/291] Writing tensor layers.30.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[280/291] Writing tensor layers.30.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[281/291] Writing tensor layers.30.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[282/291] Writing tensor layers.30.ffn_norm.weight, size 4096...\n",
            "[283/291] Writing tensor layers.31.attention.wq.weight, size 4096 x 4096...\n",
            "[284/291] Writing tensor layers.31.attention.wk.weight, size 4096 x 4096...\n",
            "[285/291] Writing tensor layers.31.attention.wv.weight, size 4096 x 4096...\n",
            "[286/291] Writing tensor layers.31.attention.wo.weight, size 4096 x 4096...\n",
            "[287/291] Writing tensor layers.31.attention_norm.weight, size 4096...\n",
            "[288/291] Writing tensor layers.31.feed_forward.w1.weight, size 11008 x 4096...\n",
            "[289/291] Writing tensor layers.31.feed_forward.w2.weight, size 4096 x 11008...\n",
            "[290/291] Writing tensor layers.31.feed_forward.w3.weight, size 11008 x 4096...\n",
            "[291/291] Writing tensor layers.31.ffn_norm.weight, size 4096...\n",
            "Wrote zh-models/7B/ggml-model-f16.bin\n"
          ]
        }
      ]
    },
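    {
      "cell_type": "markdown",
      "source": [
        "### Quantize the FP16 model to 4-bit\n",
        "\n",
        "Next, we quantize the FP16 model to 4-bit. The trailing `2` argument in the command selects the `q4_0` quantization type in this version of llama.cpp."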
      ],
      "metadata": {
        "id": "hEZEJAVYCHkc"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && ./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin 2"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2xyais7OUVDI",
        "outputId": "99b4154e-1370-4240-c06b-69ff2f49ee37"
      },
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "llama.cpp: loading model from ./zh-models/7B/ggml-model-f16.bin\n",
            "llama.cpp: saving model to ./zh-models/7B/ggml-model-q4_0.bin\n",
            "[1/291]                tok_embeddings.weight - [4096 x 49954], type =    f16, quantizing .. size =   390.27 MB ->   121.96 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[2/291]                          norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[3/291]                        output.weight - [4096 x 49954], type =    f16, quantizing .. size =   390.27 MB ->   121.96 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[4/291]         layers.0.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.071 0.103 0.137 0.158 0.137 0.103 0.071 0.046 0.028 0.016 0.021 \n",
            "[5/291]         layers.0.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.027 0.045 0.071 0.104 0.138 0.158 0.139 0.104 0.071 0.045 0.027 0.016 0.021 \n",
            "[6/291]         layers.0.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.018 0.032 0.051 0.076 0.103 0.128 0.141 0.128 0.103 0.075 0.051 0.032 0.019 0.022 \n",
            "[7/291]         layers.0.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.072 0.105 0.136 0.151 0.136 0.105 0.072 0.046 0.028 0.016 0.021 \n",
            "[8/291]       layers.0.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[9/291]      layers.0.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[10/291]      layers.0.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[11/291]      layers.0.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[12/291]             layers.0.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[13/291]         layers.1.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.051 0.077 0.104 0.127 0.137 0.127 0.104 0.077 0.051 0.032 0.019 0.022 \n",
            "[14/291]         layers.1.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.018 0.032 0.051 0.076 0.104 0.128 0.138 0.128 0.104 0.077 0.051 0.032 0.018 0.022 \n",
            "[15/291]         layers.1.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.018 0.031 0.051 0.076 0.104 0.129 0.139 0.129 0.104 0.076 0.051 0.031 0.018 0.021 \n",
            "[16/291]         layers.1.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.071 0.104 0.137 0.154 0.137 0.104 0.071 0.046 0.028 0.016 0.021 \n",
            "[17/291]       layers.1.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[18/291]      layers.1.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[19/291]      layers.1.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[20/291]      layers.1.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[21/291]             layers.1.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[22/291]         layers.2.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[23/291]         layers.2.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.051 0.076 0.104 0.127 0.137 0.127 0.104 0.077 0.051 0.032 0.019 0.022 \n",
            "[24/291]         layers.2.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.136 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[25/291]         layers.2.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[26/291]       layers.2.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[27/291]      layers.2.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[28/291]      layers.2.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[29/291]      layers.2.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[30/291]             layers.2.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[31/291]         layers.3.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[32/291]         layers.3.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.136 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[33/291]         layers.3.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[34/291]         layers.3.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[35/291]       layers.3.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[36/291]      layers.3.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[37/291]      layers.3.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[38/291]      layers.3.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[39/291]             layers.3.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[40/291]         layers.4.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[41/291]         layers.4.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[42/291]         layers.4.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[43/291]         layers.4.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[44/291]       layers.4.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[45/291]      layers.4.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[46/291]      layers.4.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[47/291]      layers.4.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[48/291]             layers.4.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[49/291]         layers.5.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[50/291]         layers.5.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[51/291]         layers.5.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[52/291]         layers.5.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[53/291]       layers.5.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[54/291]      layers.5.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[55/291]      layers.5.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[56/291]      layers.5.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[57/291]             layers.5.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[58/291]         layers.6.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[59/291]         layers.6.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[60/291]         layers.6.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[61/291]         layers.6.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[62/291]       layers.6.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[63/291]      layers.6.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[64/291]      layers.6.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[65/291]      layers.6.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[66/291]             layers.6.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[67/291]         layers.7.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[68/291]         layers.7.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[69/291]         layers.7.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[70/291]         layers.7.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[71/291]       layers.7.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[72/291]      layers.7.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[73/291]      layers.7.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[74/291]      layers.7.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[75/291]             layers.7.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[76/291]         layers.8.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[77/291]         layers.8.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[78/291]         layers.8.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[79/291]         layers.8.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[80/291]       layers.8.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[81/291]      layers.8.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[82/291]      layers.8.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[83/291]      layers.8.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[84/291]             layers.8.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[85/291]         layers.9.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[86/291]         layers.9.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[87/291]         layers.9.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[88/291]         layers.9.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[89/291]       layers.9.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[90/291]      layers.9.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[91/291]      layers.9.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[92/291]      layers.9.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[93/291]             layers.9.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[94/291]        layers.10.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[95/291]        layers.10.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[96/291]        layers.10.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[97/291]        layers.10.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[98/291]      layers.10.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[99/291]     layers.10.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[100/291]     layers.10.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[101/291]     layers.10.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[102/291]            layers.10.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[103/291]        layers.11.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[104/291]        layers.11.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[105/291]        layers.11.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[106/291]        layers.11.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[107/291]      layers.11.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[108/291]     layers.11.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[109/291]     layers.11.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[110/291]     layers.11.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[111/291]            layers.11.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[112/291]        layers.12.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[113/291]        layers.12.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[114/291]        layers.12.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[115/291]        layers.12.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[116/291]      layers.12.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[117/291]     layers.12.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[118/291]     layers.12.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[119/291]     layers.12.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[120/291]            layers.12.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[121/291]        layers.13.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[122/291]        layers.13.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[123/291]        layers.13.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[124/291]        layers.13.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[125/291]      layers.13.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[126/291]     layers.13.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[127/291]     layers.13.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[128/291]     layers.13.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[129/291]            layers.13.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[130/291]        layers.14.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[131/291]        layers.14.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[132/291]        layers.14.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[133/291]        layers.14.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[134/291]      layers.14.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[135/291]     layers.14.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[136/291]     layers.14.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[137/291]     layers.14.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[138/291]            layers.14.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[139/291]        layers.15.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[140/291]        layers.15.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[141/291]        layers.15.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[142/291]        layers.15.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[143/291]      layers.15.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[144/291]     layers.15.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[145/291]     layers.15.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[146/291]     layers.15.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[147/291]            layers.15.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[148/291]        layers.16.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[149/291]        layers.16.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[150/291]        layers.16.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[151/291]        layers.16.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[152/291]      layers.16.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[153/291]     layers.16.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[154/291]     layers.16.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[155/291]     layers.16.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[156/291]            layers.16.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[157/291]        layers.17.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[158/291]        layers.17.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[159/291]        layers.17.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[160/291]        layers.17.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[161/291]      layers.17.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[162/291]     layers.17.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[163/291]     layers.17.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[164/291]     layers.17.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[165/291]            layers.17.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[166/291]        layers.18.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[167/291]        layers.18.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[168/291]        layers.18.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[169/291]        layers.18.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[170/291]      layers.18.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[171/291]     layers.18.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[172/291]     layers.18.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[173/291]     layers.18.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[174/291]            layers.18.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[175/291]        layers.19.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[176/291]        layers.19.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[177/291]        layers.19.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[178/291]        layers.19.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[179/291]      layers.19.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[180/291]     layers.19.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[181/291]     layers.19.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[182/291]     layers.19.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[183/291]            layers.19.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[184/291]        layers.20.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[185/291]        layers.20.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[186/291]        layers.20.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[187/291]        layers.20.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[188/291]      layers.20.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[189/291]     layers.20.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[190/291]     layers.20.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[191/291]     layers.20.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[192/291]            layers.20.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[193/291]        layers.21.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[194/291]        layers.21.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[195/291]        layers.21.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[196/291]        layers.21.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[197/291]      layers.21.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[198/291]     layers.21.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[199/291]     layers.21.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[200/291]     layers.21.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[201/291]            layers.21.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[202/291]        layers.22.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[203/291]        layers.22.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[204/291]        layers.22.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[205/291]        layers.22.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[206/291]      layers.22.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[207/291]     layers.22.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[208/291]     layers.22.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[209/291]     layers.22.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[210/291]            layers.22.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[211/291]        layers.23.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[212/291]        layers.23.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[213/291]        layers.23.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[214/291]        layers.23.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[215/291]      layers.23.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[216/291]     layers.23.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[217/291]     layers.23.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[218/291]     layers.23.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[219/291]            layers.23.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[220/291]        layers.24.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[221/291]        layers.24.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[222/291]        layers.24.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[223/291]        layers.24.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[224/291]      layers.24.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[225/291]     layers.24.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[226/291]     layers.24.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[227/291]     layers.24.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[228/291]            layers.24.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[229/291]        layers.25.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[230/291]        layers.25.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[231/291]        layers.25.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[232/291]        layers.25.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[233/291]      layers.25.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[234/291]     layers.25.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[235/291]     layers.25.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[236/291]     layers.25.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[237/291]            layers.25.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[238/291]        layers.26.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[239/291]        layers.26.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[240/291]        layers.26.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[241/291]        layers.26.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[242/291]      layers.26.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[243/291]     layers.26.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[244/291]     layers.26.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[245/291]     layers.26.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[246/291]            layers.26.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[247/291]        layers.27.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[248/291]        layers.27.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[249/291]        layers.27.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[250/291]        layers.27.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[251/291]      layers.27.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[252/291]     layers.27.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[253/291]     layers.27.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[254/291]     layers.27.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[255/291]            layers.27.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[256/291]        layers.28.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[257/291]        layers.28.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[258/291]        layers.28.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[259/291]        layers.28.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[260/291]      layers.28.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[261/291]     layers.28.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[262/291]     layers.28.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[263/291]     layers.28.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[264/291]            layers.28.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[265/291]        layers.29.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[266/291]        layers.29.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[267/291]        layers.29.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[268/291]        layers.29.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[269/291]      layers.29.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[270/291]     layers.29.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[271/291]     layers.29.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[272/291]     layers.29.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[273/291]            layers.29.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[274/291]        layers.30.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[275/291]        layers.30.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[276/291]        layers.30.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[277/291]        layers.30.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[278/291]      layers.30.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[279/291]     layers.30.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[280/291]     layers.30.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.018 0.032 0.051 0.076 0.104 0.128 0.137 0.128 0.104 0.076 0.051 0.032 0.018 0.022 \n",
            "[281/291]     layers.30.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[282/291]            layers.30.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[283/291]        layers.31.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[284/291]        layers.31.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[285/291]        layers.31.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[286/291]        layers.31.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[287/291]      layers.31.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[288/291]     layers.31.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[289/291]     layers.31.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.021 0.018 0.031 0.050 0.075 0.104 0.130 0.140 0.130 0.104 0.075 0.050 0.031 0.018 0.021 \n",
            "[290/291]     layers.31.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[291/291]            layers.31.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "llama_model_quantize_internal: model size  = 13133.55 MB\n",
            "llama_model_quantize_internal: quant size  =  4104.93 MB\n",
            "llama_model_quantize_internal: hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "\n",
            "main: quantize time = 178732.41 ms\n",
            "main:    total time = 178732.41 ms\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
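        "### (Optional) Test decoding with the quantized model\n",
        "All conversion steps are now complete.\n",
        "Run the command below to check that the quantized model loads correctly and can respond to a prompt.\n",
        "\n",
        "The FP16 and Q4-quantized files are stored under `./llama.cpp/zh-models/7B`; download them as needed."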
      ],
      "metadata": {
        "id": "DLkuRAo9Vkb1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && ./main -m ./zh-models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -p \"详细介绍一下北京的名胜古迹：\" -n 512"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tW-ep1BsVQtG",
        "outputId": "0706c974-127e-4f21-be6b-d71ea4fb989b"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "main: seed = 1681467955\n",
            "llama.cpp: loading model from ./zh-models/7B/ggml-model-q4_0.bin\n",
            "llama_model_load_internal: format     = ggjt v1 (latest)\n",
            "llama_model_load_internal: n_vocab    = 49954\n",
            "llama_model_load_internal: n_ctx      = 512\n",
            "llama_model_load_internal: n_embd     = 4096\n",
            "llama_model_load_internal: n_mult     = 256\n",
            "llama_model_load_internal: n_head     = 32\n",
            "llama_model_load_internal: n_layer    = 32\n",
            "llama_model_load_internal: n_rot      = 128\n",
            "llama_model_load_internal: ftype      = 2 (mostly Q4_0)\n",
            "llama_model_load_internal: n_ff       = 11008\n",
            "llama_model_load_internal: n_parts    = 1\n",
            "llama_model_load_internal: model size = 7B\n",
            "llama_model_load_internal: ggml ctx size =  59.11 KB\n",
            "llama_model_load_internal: mem required  = 5896.99 MB (+ 1026.00 MB per state)\n",
            "llama_init_from_file: kv self size  =  256.00 MB\n",
            "\n",
            "system_info: n_threads = 40 / 40 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | \n",
            "sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000\n",
            "generate: n_ctx = 512, n_batch = 8, n_predict = 512, n_keep = 0\n",
            "\n",
            "\n",
            "\u001b[33m 详细介绍一下北京的名胜古迹：\u001b[0m\n",
            " 故宫：明、清两代皇室，御花园及八达门大街。 宫殿内有大量文物珍品； [end of text]\n",
            "\n",
            "llama_print_timings:        load time =   717.01 ms\n",
            "llama_print_timings:      sample time =    48.97 ms /    32 runs   (    1.53 ms per run)\n",
            "llama_print_timings: prompt eval time =   680.93 ms /    11 tokens (   61.90 ms per token)\n",
            "llama_print_timings:        eval time =  4490.00 ms /    31 runs   (  144.84 ms per run)\n",
            "llama_print_timings:       total time =  5461.05 ms\n"
          ]
        }
      ]
    }
  ]
}